{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Development\n",
    "\n",
    "We are ready to create a Diabetes model (using PyTorch) which will predict whether or not a patient has diabetes based on current medical readings. \n",
    "\n",
    "To start we will need to import our required libraries and packages.  We will load our diabetes data set, create test and training sets and then start developing our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Diabetes model using PyTorch\n",
    "# Uses the data file:  diabetes.csv\n",
    "import pandas as pd\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import accuracy_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After we have imported our required lbraries and packages we load the data into a dataframe (df)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#This code assumes that the file is in a data folder up one dir from this file.\n",
    "data_file_name     = '../data/diabetes.csv'\n",
    "model_saved_name   = '../model/PytorchDiabetesModel.pt'\n",
    "\n",
    "df                 = pd.read_csv(data_file_name)\n",
    "X                  = df.drop('Outcome' , axis = 1) #independent Feature\n",
    "y                  = df['Outcome']                 #dependent Feature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can train the model, we need to divide the data into 'training' and 'testing' datasets.  We will use sklearn's train_test_split method to split the dataset into random train and test subsets.\n",
    "\n",
    "Once we have done this, we create tensors.  Tensors are specialized data structures that are similar to arrays and matrices.  In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.  Below we are initializing the tensors directly from the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train,X_test,y_train,y_test = train_test_split(X,y , test_size =0.2,random_state=0)\n",
    "\n",
    "# Creating Tensors (multidimensional matrix) x-input data  y-output data\n",
    "X_train           = torch.FloatTensor(X_train.values)\n",
    "X_test            = torch.FloatTensor(X_test.values)\n",
    "y_train           = torch.LongTensor(y_train.values)\n",
    "y_test            = torch.LongTensor(y_test.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can create our model.  We will need to eventually create a python file to house our model and api code.  Therefore let's put our model into a class called \"ANN_model\" which we can re-use later in our Python (.py) file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ANN_model(nn.Module):\n",
    "    def __init__(self,input_features=8,hidden1=20, hidden2=10,out_features=2):\n",
    "        super().__init__()\n",
    "        self.f_connected1 = nn.Linear(input_features,hidden1)\n",
    "        self.f_connected2 = nn.Linear(hidden1,hidden2)\n",
    "        self.out          = nn.Linear(hidden2,out_features)\n",
    "        \n",
    "    def forward(self,x):\n",
    "        x = F.relu(self.f_connected1(x))\n",
    "        x = F.relu(self.f_connected2(x))\n",
    "        x = self.out(x)\n",
    "        return x\n",
    "\n",
    "    def save(self, model_path):\n",
    "        torch.save(model.state_dict(), model_path)\n",
    "\n",
    "    def load(self, model_path):\n",
    "        self.load_state_dict(torch.load(model_path))\n",
    "        self.eval()\n",
    "\n",
    "torch.manual_seed(20)\n",
    "model = ANN_model()\n",
    "\n",
    "\n",
    "# Backward Propagation - loss and optimizer\n",
    "loss_function = nn.CrossEntropyLoss()   #CrossEntropyLoss also used in Tensorflow\n",
    "optimizer     = torch.optim.Adam(model.parameters(),lr=0.01)  #note Tensorflow also uses Adam\n",
    "\n",
    "epochs        =500\n",
    "final_losses  =[]\n",
    "for i in range(epochs):\n",
    "    i      = i+1\n",
    "    y_pred = model.forward(X_train)\n",
    "    loss   = loss_function(y_pred,y_train)\n",
    "    final_losses.append(loss)\n",
    "    #if i % 10 == 1:\n",
    "    #    print(\"Epoch number: {} and the loss : {}\".format(i,loss.item()))\n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "    optimizer.step()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once our model is created we should test the model's accuracy.  We can do this by comparing the results from the test data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Accuracy - comparing the results from the test data\n",
    "\n",
    "predictions = []\n",
    "with torch.no_grad():\n",
    "    for i,data in enumerate(X_test):\n",
    "        y_pred = model(data)\n",
    "        #print(\"y_pred: {}  argmax: {}   item: {}\".format(y_pred, y_pred.argmax(), y_pred.argmax().item()))\n",
    "        #predictions.append(y_pred.argmax().item())\n",
    "        predictions.append(y_pred.argmax())\n",
    "        \n",
    "score = accuracy_score(y_test , predictions)  # Simply calculates number of hits / length of y_test\n",
    "print(score)\n",
    "\n",
    "# save model\n",
    "# for more information on saving and loading PyTorch models: https://pytorch.org/tutorials/beginner/saving_loading_models.html\n",
    "# we are saving the model as a 'pt'.  Another file format we could use is a Pickle file.  Following article describes this process\n",
    "# https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/\n",
    "\n",
    "model.save(model_saved_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that our model has an accuracy of 78%  This is a good time to discuss Normalization of data.  Normalization is the process of rescaling numeric values into a 0-1 range.  Data normalization will make model training less sensitive to the scale of features hence allowing our model to converge to better weights.  This in turn leads to our model being more accurate. \n",
    "\n",
    "We will not be normalizing the data in this learning path.  In the next learning path 'How to Deploy a PyTorch model' we will work on model preparation which include the normalization of data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have built a model, let's test it with some data from 2 patients:  one patient with diabetes and one patient without diabetes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load model and predict using class members\n",
    "ann_model = ANN_model()\n",
    "ann_model.load(model_saved_name)\n",
    "\n",
    "# Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age\n",
    "predict_data        = [6, 110.0, 65.0, 15.0, 1.0, 45.7, 0.627, 50.0] # has diabetes\n",
    "#predict_data       = [0, 88.0, 60.0, 35.0, 1.0, 45.7, 0.27, 20.0] # no diabetes\n",
    "predict_data_tensor = torch.tensor(predict_data)  #Convert input array to tensor\n",
    "prediction_value    = ann_model(predict_data_tensor)  # This is a tensor\n",
    "\n",
    "# Dict for textual display of prediction\n",
    "outcomes            = {0: 'No diabetes',1:'Diabetes Predicted'}\n",
    "\n",
    "# From the prediction tensor, get the index of the max value ( Either 0 or 1)\n",
    "prediction_index   = prediction_value.argmax().item()\n",
    "\n",
    "print(outcomes[prediction_index])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make our model testing easier, let's create a prediction function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#create prediction function\n",
    "\n",
    "def predict(dataset):\n",
    "    predict_data = dataset\n",
    "    predict_data_tensor = torch.tensor(predict_data)      #Convert input array to tensor\n",
    "    prediction_value    = ann_model(predict_data_tensor)  # This is a tensor\n",
    "\n",
    "    # Dict for textual display of prediction\n",
    "    outcomes            = {0: 'No diabetes',1:'Diabetes Predicted'}\n",
    "\n",
    "    # From the prediction tensor, get the index of the max value ( Either 0 or 1)\n",
    "    prediction_index   = prediction_value.argmax().item()\n",
    "    prediction = outcomes[prediction_index]\n",
    "    return prediction\n",
    "\n",
    "#test our prediction function\n",
    "dataset = [6.0, 110.0, 65.0, 15.0, 1.0, 45.7, 0.627, 50.0] #has diabetes\n",
    "#dataset       = [0, 88.0, 60.0, 35.0, 1.0, 45.7, 0.27, 20.0] # no diabetes\n",
    "\n",
    "diabetes_prediction = predict(dataset)\n",
    "print(diabetes_prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After we make a prediction we should save the model.  The following python code saves the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save model\n",
    "model.save(model_saved_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
